Thesaurus Construction Project for Persian Manuscripts

Authors

Elaheh Rouhi Del

Dr. Nahid Bani Eghbal

Abstract

Purpose: Manuscripts, as the written works of past generations, are important collections in research and university libraries in Iran. They convey useful information about different subject areas. The demand for this enormous amount of information underscores the need to organize it and to apply new electronic information technologies. Methodology: Because the field of manuscripts has more than 5,000 specialized terms, it is essential to control and organize these terms systematically according to their subject scope. Since a thesaurus is one of the tools for controlling the words and information terms of a subject, preparing a comprehensive electronic thesaurus in addition to a published version is recommended. Accordingly, a plan for the thesaurus, its structure, and the scope of codicology terms is surveyed. Findings: The result of the survey is, for the first time in Iran, a suggested structure for a thesaurus of codicology based on available samples, together with a brief sample of the thesaurus.
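
As a rough illustration of what controlling terms in a thesaurus involves, the following minimal Python sketch models the relationship types commonly found in thesauri: broader term (BT), narrower term (NT), related term (RT), and used for (UF). It is not the structure proposed in the article. The class names, method names, and sample codicology terms are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term and its relationships (hypothetical layout)."""
    label: str
    broader: set[str] = field(default_factory=set)    # BT
    narrower: set[str] = field(default_factory=set)   # NT
    related: set[str] = field(default_factory=set)    # RT
    used_for: set[str] = field(default_factory=set)   # UF: non-preferred labels


class Thesaurus:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        self.use: dict[str, str] = {}  # non-preferred label -> preferred label

    def term(self, label: str) -> Term:
        # Create the entry on first use so relationships can be added in any order.
        return self.terms.setdefault(label, Term(label))

    def add_hierarchy(self, broader: str, narrower: str) -> None:
        # BT and NT are reciprocal, so both sides are recorded.
        self.term(broader).narrower.add(narrower)
        self.term(narrower).broader.add(broader)

    def add_related(self, a: str, b: str) -> None:
        # RT is symmetric.
        self.term(a).related.add(b)
        self.term(b).related.add(a)

    def add_synonym(self, preferred: str, non_preferred: str) -> None:
        self.term(preferred).used_for.add(non_preferred)
        self.use[non_preferred] = preferred

    def resolve(self, label: str) -> str:
        # Map a non-preferred label to its preferred term; others pass through.
        return self.use.get(label, label)


# Illustrative entries only; these terms are assumptions, not taken from the article.
thesaurus = Thesaurus()
thesaurus.add_hierarchy("Manuscript components", "Colophon")
thesaurus.add_related("Colophon", "Scribe")
thesaurus.add_synonym("Colophon", "Anjameh")

print(thesaurus.resolve("Anjameh"))
```

An electronic thesaurus built along these lines lets a search interface redirect a non-preferred label to its preferred term before retrieval, which is one way the vocabulary control described above could be put to use.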

Similar resources

Improving Persian Text Classification and Clustering Using Persian Thesaurus

This paper proposes an innovative approach to improve the classification performance of Persian texts. The proposed method uses a thesaurus as auxiliary knowledge to obtain more representative word frequencies in the corpus. Two types of word relationships are considered in the thesaurus used. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. Ex...

Spectral Methods for Thesaurus Construction

Traditionally, popular synonym acquisition methods are based on the distributional hypothesis, and a metric such as Jaccard coefficients is used to evaluate the similarity between the contexts of words to obtain synonyms for a query. On the other hand, when one tries to compile and clean a thesaurus, one often already has a modest number of synonym relations at hand. Could something be done wit...

Automatic thesaurus construction

In this paper we introduce a novel method of automating thesauri using syntactically constrained distributional similarity. With respect to syntactically conditioned cooccurrences, most popular approaches to automatic thesaurus construction simply ignore the salience of grammatical relations and effectively merge them into one united ‘context’. We distinguish semantic differences of each syntac...

Automatic thesaurus construction

One of the major problems of modern Information Retrieval (IR) systems is the vocabulary problem that concerns the discrepancies between terms used for describing documents and the terms used by the searchers to describe their information need. A way of handling the vocabulary problem is by using a thesaurus, which shows (usually semantic) relationships between terms. Three approaches for autom...

PLSI Utilization for Automatic Thesaurus Construction

When acquiring synonyms from large corpora, it is important to deal not only with such surface information as the context of the words but also their latent semantics. This paper describes how to utilize a latent semantic model PLSI to acquire synonyms automatically from large corpora. PLSI has been shown to achieve a better performance than conventional methods such as tf·idf and LSI, making i...

Automatic Thesaurus Construction for Information Retrieval

The Thesaurus, in information retrieval, represents one of the cardinal points of a system, since neither indexing quality nor retrieval strategy sophistication are able to remedy deficiencies of the Thesaurus used. In principle, the Thesaurus is a collection of concepts which are more or less important for the subject field of the document collection. Usually the mere list of the terms represe...
